{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# A simple DNN model built in Keras.\n",
    "\n",
    "In this notebook, we will use the ML datasets we read in with our Keras pipeline earlier and build our Keras DNN to predict the fare amount for NYC taxi cab rides.\n",
    "\n",
    "### Learning objectives\n",
    "1. Review how to read in CSV file data using tf.data\n",
    "2. Specify input, hidden, and output layers in the DNN architecture\n",
    "3. Review and visualize the final DNN shape\n",
    "4. Train the model locally and visualize the loss curves\n",
    "5. Deploy and predict with the model using Cloud AI Platform \n",
    "\n",
    "Each learning objective will correspond to a __#TODO__ in this student lab notebook -- try to complete this notebook first and then review the [solution notebook](../solution/keras_dnn.ipynb). "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!sudo chown -R jupyter:jupyter /home/jupyter/training-data-analyst"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install tensorflow==2.1 --user"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
     "Please ignore any compatibility warnings and errors.\n",
     "Make sure to <b>restart</b> your kernel to ensure this change has taken place."
  ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%bash\n",
    "export PROJECT=$(gcloud config list project --format \"value(core.project)\")\n",
    "echo \"Your current GCP Project Name is: \"$PROJECT\n",
    "gcloud config set ai_platform/region global"
  ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os, json, math\n",
    "import numpy as np\n",
    "import shutil\n",
    "import tensorflow as tf\n",
    "print(\"TensorFlow version: \",tf.version.VERSION)\n",
    "\n",
    "PROJECT = \"your-gcp-project-here\" # REPLACE WITH YOUR PROJECT NAME\n",
    "REGION = \"us-central1\" # REPLACE WITH YOUR BUCKET REGION e.g. us-central1\n",
    "\n",
    "# Do not change these\n",
    "os.environ[\"PROJECT\"] = PROJECT\n",
    "os.environ[\"REGION\"] = REGION\n",
    "os.environ[\"BUCKET\"] = PROJECT # DEFAULT BUCKET WILL BE PROJECT ID\n",
    "os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' # SET TF ERROR LOG VERBOSITY\n",
    "\n",
    "if PROJECT == \"your-gcp-project-here\":\n",
    "  print(\"Don't forget to update your PROJECT name! Currently:\", PROJECT)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%bash\n",
    "## Create GCS bucket if it doesn't exist already...\n",
    "exists=$(gcloud storage ls | grep -w gs://${PROJECT}/)\n",
    "\n",
    "if [ -n \"$exists\" ]; then\n",
    "   echo -e \"Bucket exists, let's not re-create it. \\n\\nHere are your buckets:\"\n",
    "   gcloud storage ls\n",
    "    \n",
    "else\n",
    "   echo \"Creating a new GCS bucket.\"\n",
    "   gcloud storage buckets create --location=${REGION} gs://${PROJECT}\n",
    "   echo \"\\nHere are your current buckets:\"\n",
    "   gcloud storage ls\n",
    "fi"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Locating the CSV files\n",
    "\n",
    "We will start with the CSV files that we wrote out in the [first notebook](../01_explore/taxifare.iypnb) of this sequence. Just so you don't have to run the notebook, we saved a copy in ../../data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!ls -l ../../data/*.csv"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Use tf.data to read the CSV files\n",
    "\n",
    "We wrote these cells in the [third notebook](../03_tfdata/solution/input_pipeline.ipynb) of this sequence where we created a data pipeline with Keras.\n",
    "\n",
    "First let's define our columns of data, which column we're predicting for, and the default values."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "CSV_COLUMNS  = ['fare_amount',  'pickup_datetime',\n",
    "                'pickup_longitude', 'pickup_latitude', \n",
    "                'dropoff_longitude', 'dropoff_latitude', \n",
    "                'passenger_count', 'key']\n",
    "# TODO 1: Specify the LABEL_COLUMN name you are predicting for below:\n",
    "LABEL_COLUMN = ''\n",
    "DEFAULTS     = [[0.0],['na'],[0.0],[0.0],[0.0],[0.0],[0.0],['na']]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next, let's define our features we want to use and our label(s) and then load in the dataset for training."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def features_and_labels(row_data):\n",
    "    for unwanted_col in ['pickup_datetime', 'key']:\n",
    "        row_data.pop(unwanted_col)\n",
    "    label = row_data.pop(LABEL_COLUMN)\n",
    "    return row_data, label  # features, label\n",
    "\n",
    "# load the training data\n",
    "def load_dataset(pattern, batch_size=1, mode=tf.estimator.ModeKeys.EVAL):\n",
    "  dataset = (\n",
    "              # TODO 1: Complete the four tf.data.experimental.make_csv_dataset options\n",
    "              # Choose from and correctly order: batch_size, CSV_COLUMNS, DEFAULTS, pattern\n",
    "              tf.data.experimental.make_csv_dataset() # <--- fill-in options\n",
    "              .map(features_and_labels) # features, label\n",
    "             )\n",
    "  if mode == tf.estimator.ModeKeys.TRAIN:\n",
    "        dataset = dataset.shuffle(1000).repeat()\n",
    "  dataset = dataset.prefetch(1) # take advantage of multi-threading; 1=AUTOTUNE\n",
    "  return dataset"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Build a DNN with Keras\n",
    "\n",
    "Now let's build the Deep Neural Network (DNN) model in Keras and specify the input and hidden layers. We will print out the DNN architecture and then visualize it later on."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "## Build a simple Keras DNN using its Functional API\n",
    "def rmse(y_true, y_pred):\n",
    "    return tf.sqrt(tf.reduce_mean(tf.square(y_pred - y_true))) \n",
    "\n",
    "def build_dnn_model():\n",
    "    # TODO 2: Specify the five input columns\n",
    "    INPUT_COLS = []\n",
    "\n",
    "    # input layer\n",
    "    inputs = {\n",
    "        colname : tf.keras.layers.Input(name=colname, shape=(), dtype='float32')\n",
    "           for colname in INPUT_COLS\n",
    "    }\n",
    "    feature_columns = {\n",
    "        colname : tf.feature_column.numeric_column(colname)\n",
    "           for colname in INPUT_COLS\n",
    "    }\n",
    "    \n",
    "    # the constructor for DenseFeatures takes a list of numeric columns\n",
    "    # The Functional API in Keras requires that you specify: LayerConstructor()(inputs)\n",
    "    dnn_inputs = tf.keras.layers.DenseFeatures(feature_columns.values())(inputs)\n",
    "\n",
    "    # two hidden layers of [32, 8] just in like the BQML DNN\n",
    "    # TODO 2: Create two hidden layers [32,8] with relu activation. Name them h1 and h2\n",
    "    # Tip: Start with h1 = tf.keras.layers.dense\n",
    "    h1 = # complete\n",
    "    h2 = # complete\n",
    "\n",
    "    # final output is a linear activation because this is regression\n",
    "    # TODO 2: Create an output layer with linear activation and name it 'fare'\n",
    "    output = \n",
    "    \n",
    "    # TODO 2: Use tf.keras.models.Model and create your model with inputs and output\n",
    "    model = \n",
    "    \n",
    "    model.compile(optimizer='adam', loss='mse', metrics=[rmse, 'mse'])\n",
    "    return model\n",
    "\n",
    "print(\"Here is our DNN architecture so far:\\n\")\n",
    "model = build_dnn_model()\n",
    "print(model.summary())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Visualize the DNN\n",
    "\n",
    "We can visualize the DNN using the Keras [plot_model](https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/keras/utils/plot_model) utility."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# TODO 3: Use tf.keras.utils.plot_model() to create a dnn_model.png of your architecture\n",
    "# Tip: For rank direction, choose Left Right (rankdir='LR')\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Train the model\n",
    "\n",
    "To train the model, simply call [model.fit()](https://keras.io/models/model/#fit).\n",
    "\n",
    "Note that we should really use many more NUM_TRAIN_EXAMPLES (i.e. a larger dataset). We shouldn't make assumptions about the quality of the model based on training/evaluating it on a small sample of the full data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "TRAIN_BATCH_SIZE = 32\n",
    "NUM_TRAIN_EXAMPLES = 10000 * 5 # training dataset repeats, so it will wrap around\n",
    "NUM_EVALS = 5  # how many times to evaluate\n",
    "NUM_EVAL_EXAMPLES = 10000 # enough to get a reasonable sample, but not so much that it slows down\n",
    "\n",
    "trainds = load_dataset('../../data/taxi-train*', TRAIN_BATCH_SIZE, tf.estimator.ModeKeys.TRAIN)\n",
    "evalds = load_dataset('../../data/taxi-valid*', 1000, tf.estimator.ModeKeys.EVAL).take(NUM_EVAL_EXAMPLES//1000)\n",
    "\n",
    "steps_per_epoch = NUM_TRAIN_EXAMPLES // (TRAIN_BATCH_SIZE * NUM_EVALS)\n",
    "\n",
    "# TODO 4: Pass in the correct parameters to train your model\n",
    "history = model.fit(\n",
    "                \n",
    "                    )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Visualize the model loss curve\n",
    "\n",
    "Next, we will use matplotlib to draw the model's loss curves for training and validation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# plot\n",
    "import matplotlib.pyplot as plt\n",
    "nrows = 1\n",
    "ncols = 2\n",
    "fig = plt.figure(figsize=(10, 5))\n",
    "\n",
    "for idx, key in enumerate(['loss', 'rmse']):\n",
    "    ax = fig.add_subplot(nrows, ncols, idx+1)\n",
    "    plt.plot(history.history[key])\n",
    "    plt.plot(history.history['val_{}'.format(key)])\n",
    "    plt.title('model {}'.format(key))\n",
    "    plt.ylabel(key)\n",
    "    plt.xlabel('epoch')\n",
    "    plt.legend(['train', 'validation'], loc='upper left');"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Predict with the model locally\n",
    "\n",
    "To predict with Keras, you simply call [model.predict()](https://keras.io/models/model/#predict) and pass in the cab ride you want to predict the fare amount for."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model.predict({\n",
    "    'pickup_longitude': tf.convert_to_tensor([-73.982683]),\n",
    "    'pickup_latitude': tf.convert_to_tensor([40.742104]),\n",
    "    'dropoff_longitude': tf.convert_to_tensor([-73.983766]),\n",
    "    'dropoff_latitude': tf.convert_to_tensor([40.755174]),\n",
    "    'passenger_count': tf.convert_to_tensor([3.0]),    \n",
    "}, steps=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Of course, this is not realistic, because we can't expect client code to have a model object in memory. We'll have to export our model to a file, and expect client code to instantiate the model from that exported file."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Export the model for serving\n",
    "\n",
    "Let's export the model to a TensorFlow SavedModel format. Once we have a model in this format, we have lots of ways to \"serve\" the model, from a web application, from JavaScript, from mobile applications, etc."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import shutil, os, datetime\n",
    "OUTPUT_DIR = './export/savedmodel'\n",
    "shutil.rmtree(OUTPUT_DIR, ignore_errors=True)\n",
    "EXPORT_PATH = os.path.join(OUTPUT_DIR, datetime.datetime.now().strftime('%Y%m%d%H%M%S'))\n",
    "tf.saved_model.save(model, EXPORT_PATH) # with default serving function"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!saved_model_cli show --tag_set serve --signature_def serving_default --dir {EXPORT_PATH}\n",
    "!find {EXPORT_PATH}\n",
    "os.environ['EXPORT_PATH'] = EXPORT_PATH"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Deploy the model to AI Platform\n",
    "\n",
    "Next, we will use the `gcloud ai-platform` command to create a new version for our __taxifare__ model and give it the version name of __dnn__. \n",
    "\n",
    "Deploying the model will take 5 - 10 minutes. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%bash\n",
    "PROJECT=${PROJECT}\n",
    "BUCKET=${BUCKET}\n",
    "REGION=${REGION}\n",
    "MODEL_NAME=taxifare\n",
    "VERSION_NAME=dnn\n",
    "\n",
    "if [[ $(gcloud ai-platform models list --format='value(name)' | grep $MODEL_NAME) ]]; then\n",
    "    echo \"The model named $MODEL_NAME already exists.\"\n",
    "else\n",
    "    # create model\n",
    "    echo \"Creating $MODEL_NAME model now.\"\n",
    "    gcloud ai-platform models create --regions=$REGION $MODEL_NAME\n",
    "fi\n",
    "\n",
    "if [[ $(gcloud ai-platform versions list --model $MODEL_NAME --format='value(name)' | grep $VERSION_NAME) ]]; then\n",
    "    echo \"Deleting already the existing model $MODEL_NAME:$VERSION_NAME ... \"\n",
    "    gcloud ai-platform versions delete --model=$MODEL_NAME $VERSION_NAME\n",
    "    echo \"Please run this cell again if you don't see a Creating message ... \"\n",
    "    sleep 2\n",
    "fi\n",
    "\n",
    "# create model\n",
    "echo \"Creating $MODEL_NAME:$VERSION_NAME\"\n",
    "\n",
    "# TODO 5: Create the model using gcloud ai-platform predict\n",
    "# Refer to: https://cloud.google.com/sdk/gcloud/reference/ai-platform/predict\n",
    "gcloud ai-platform versions create # complete the missing parameters"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Monitor the model creation at [GCP Console > AI Platform](https://console.cloud.google.com/mlengine/models/taxifare/) and once the model version `dnn` is created, proceed to the next cell.\n",
    "\n",
    "### Predict with model using `gcloud ai-platform predict`\n",
    "\n",
    "To predict with the model, we first need to create some data that the model hasn't seen before. Let's predict for a new taxi cab ride for you and two friends going from [from Kips Bay and heading to Midtown Manhattan](https://www.google.com/maps/dir/40.742104,-73.982683/'40.755174,-73.983766'/@40.7487493,-73.9892016,16z/data=!3m1!4b1!4m6!4m5!1m0!1m3!2m2!1d-73.983766!2d40.755174) for a total distance of 1.3 miles. How much would that cost?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%writefile input.json\n",
    "{\"pickup_longitude\": -73.982683, \"pickup_latitude\": 40.742104,\"dropoff_longitude\": -73.983766,\"dropoff_latitude\": 40.755174,\"passenger_count\": 3.0}  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!gcloud ai-platform predict --model taxifare --json-instances input.json --version dnn"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the [next notebook](../05_feateng), we will improve this model through feature engineering."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Copyright 2022 Google Inc.\n",
    "Licensed under the Apache License, Version 2.0 (the \"License\"); you may not use this file except in compliance with the License. You may obtain a copy of the License at\n",
    "http://www.apache.org/licenses/LICENSE-2.0\n",
    "Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
